SECTION I


INTRODUCTION 

The  use  of  theory  to  obtain  collision  cross  sections  from  electron  transport  data,  one 
of  the  "inverse  problems"  of  physics,  was  pioneered  by  Townsend  and  by  Ramsauer  in  the 
1920's.  The  method  used  in  such  early  analyses  involved  measuring  the  drift  velocity  of 
electrons  in  a  gas  as  a  function  of  E/p  (electric  field  strength  divided  by  gas  pressure)  and 
inverting  the  integral  relating  the  drift  velocity  and  the  momentum  transfer  cross  section 
using  an  approximate  expression  for  the  energy  distribution  of  the  electrons.  This 
technique  has  increased  in  sophistication  over  the  years.  In  the  1960's  Phelps  and  various 
collaborators  applied  electronic  computation  to  the  problem  and  developed  algorithms  for 
solving  the  Boltzmann  equation  for  transport  of  electrons  in  a  weakly  ionized  plasma  to 
obtain  an  accurate  electron  energy  distribution  function  valid  at  higher  fields  and  in  the 
presence  of  inelastic  and,  even,  superelastic  collisions.  This  began  an  era  that  has  given  us 
very  accurate  momentum  transfer  and  lower  energy  (rotational  and  vibrational)  inelastic 
cross  sections  that  have  been  derived  from  measurements  of  the  drift  and  diffusion  of 
electrons  in  gases.  This  methodology  is  reviewed  in  Refs.  [1-4].  This  has  become  an 
increasingly  active  field  in  recent  years  due,  for  example,  to  the  desire  for  cross  sectional 
data on molecules such as CH4, CF4, SF6, SiH4, and SiF4 that are used in semiconductor
plasma  processing  and  in  switching  applications. 

Clearly  the  iterative  process  of  choosing  energy  dependences  of  test  cross  sections; 
solving  the  Boltzmann  equation  for  a  range  of  values  of  electric  field;  computing  transport 
coefficients;  comparing  to  measured  values;  revising  the  test  cross  sections;  etc.  is  very 
labor  intensive  and  "hands  on."  It  is  obviously  a  process  where  the  experience  of  the 
researcher  plays  an  important  role  comparable  to  that  of  the  specific  computational 
techniques  used.  The  object  of  the  research  being  reported  on  here  is  to  evaluate  several 
computational  methods  for  reducing  the  labor  involved  in  this  deconvolution  process. 

SECTION II


BOLTZMANN'S  EQUATION  AND  ELECTRON  SWARMS 

The  pertinent  equations  in  this  problem  are  the  electron  Boltzmann  equation  and  its 
energy  integral,  which  relates  the  various  transport  coefficients  for  electrons  in  a  gas.  We 
can  see  how  the  various  aspects  of  this  problem  are  related  to  each  other  by  examining  the 
so  called  two-term  expansion  of  the  Boltzmann  equation  and  the  various  electron  transport 
coefficients.  If  we  take  the  general  form  for  the  Boltzmann  equation, 

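$$\left(\frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla_r + \mathbf{a}\cdot\nabla_v\right) f(\mathbf{r},\mathbf{v},t) = \left(\frac{\delta f}{\delta t}\right)_{\text{collisions}},$$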

neglect the spatial and temporal dependence of the distribution function $f(\mathbf{r},\mathbf{v},t)$, and express $f = f(\mathbf{v})$ as the first two terms of a spherical harmonic expansion, that is

$$f(\mathbf{v}) = f_0(v) + \frac{\mathbf{v}}{v}\cdot\mathbf{f}_1(v),$$

then we obtain the following scalar equation for $f_0(\epsilon)$ (where $\epsilon = mv^2/2$):

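$$\frac{1}{3}\left(\frac{eE}{N}\right)^2 \frac{d}{d\epsilon}\left\{\frac{\epsilon}{\sigma_m}\frac{df_0}{d\epsilon}\right\} + \frac{d}{d\epsilon}\left\{\frac{2m\sigma_m}{M}\,\epsilon^2\left[f_0(\epsilon) + kT\,\frac{df_0}{d\epsilon}\right]\right\} + \sum_j \left[(\epsilon+\epsilon_j)\,f_0(\epsilon+\epsilon_j)\,\sigma_j(\epsilon+\epsilon_j) - \epsilon\,f_0(\epsilon)\,\sigma_j(\epsilon)\right] = 0 \qquad (1)$$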

Here we have assumed that the populations of the excited levels, labeled by j, are small enough that superelastic collisions and transitions among excited states are unimportant. The electron impact cross sections involved are $\sigma_m$, the momentum transfer cross section, and $\{\sigma_j\}$, the set of cross sections for transitions from the ground state to the various excited states $\{j\}$. This equation does a remarkably good job of describing the transport of
electrons  under  the  influence  of  an  electric  field  in  most  gases. 

The Boltzmann equation, via the probability density function $f_0(\epsilon)$, is a microscopic description of the behavior of electrons in a gas. We need to relate $f_0(\epsilon)$ to some macroscopic quantities that can be measured. This is done by performing an energy integral of (1), which gives the following:


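$$v_d\,\frac{E}{N} - \sqrt{\frac{2e}{m}}\int \epsilon^2\,\frac{2m}{M}\,\sigma_m(\epsilon)\left[f_0(\epsilon) + kT\,\frac{df_0}{d\epsilon}\right]d\epsilon = \sum_j \epsilon_j k_j \qquad (2)$$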


The drift velocity times the electric field divided by the gas number density is the power input; the second term is the power lost due to elastic collisions (which is reduced by recoil); and the RHS of the equation is the power lost due to inelastic collisions. The drift velocity, $v_d$, and the elastic collision term contain the momentum transfer cross section, while all the terms involve integrals over the electron energy distribution function, $f_0(\epsilon)$, itself a function of the cross sections. The two most commonly measured transport coefficients are the drift velocity, $v_d$, and the transverse diffusion coefficient, $D_T$, which are related to $f_0(\epsilon)$ and the momentum transfer cross section, $\sigma_m(\epsilon)$, by the following:

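$$v_d \propto \int \frac{\epsilon}{\sigma_m(\epsilon)}\,\frac{df_0}{d\epsilon}\,d\epsilon \quad\text{and}\quad D_T \propto \int \frac{f_0(\epsilon)}{\sigma_m(\epsilon)}\,\epsilon\,d\epsilon$$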

We see that the drift velocity and diffusion coefficient sample different aspects of $f_0(\epsilon)$ and, hence, represent two somewhat independent pieces of information. Generally the quantity $D_T/\mu$, the characteristic energy, is reported in the literature, rather than $D_T$ itself. For a Maxwellian distribution of electrons, where $f_0(\epsilon) \propto \exp(-\epsilon/kT_e)$, for which an electron temperature, $T_e$, can be defined, the Einstein relation $D_T/\mu = kT_e = 2\langle\epsilon\rangle/3$ holds. Since the mean electron energy $\langle\epsilon\rangle$ is not a measurable quantity (it is usually computed by solving Boltzmann's equation), the characteristic energy is generally the only measure of electron energy that we have. We see that comparison with measured $D_T/\mu$ values gives us another constraint on the cross sections, as does comparison with measured rate coefficients, $k_j \propto \int \sigma_j(\epsilon)\,f_0(\epsilon)\,\epsilon\,d\epsilon$, and spectral data where they are available.

The relationship between the cross sections and the transport coefficients via the distribution function $f_0(\epsilon)$ is highly nonlinear. We have a mapping

$$\left\{\sigma_m(\epsilon),\ \{\sigma_j(\epsilon)\}\right\} \Longrightarrow \left\{v_d(E/N),\ D_T/\mu(E/N),\ \{k_j(E/N)\}\right\}$$

and  we  want  to  find  the  reverse  mapping  given  the  transport  data.  It  has  been  claimed  in 
the  literature  (see  Ref.  5,  for  example)  that  the  reverse  mapping  is  not  unique,  but  we  have 
never  seen  it  proven.  It  seems  likely  that  the  more  transport  data  we  have  available,  the 
more  likely  it  is  that  the  reverse  mapping  is  going  to  be  unique. 

SECTION III


TECHNIQUES  FOR  RECOVERING  CROSS  SECTIONS  FROM  SWARM  DATA 

There are numerous techniques that might be used in inverting electron transport data to obtain a collision cross section. The three classes of methods that we discuss here are (A) the downhill or creeping simplex algorithm, which is a topological approach; (B) function minimization by simulated annealing, a statistical approach; and (C) neural networks, which do not fit into any of the usual categories for numerical algorithms. The latter are very new and largely unknown in applications to physical problems. Descriptions of these approaches follow below. Initially we applied all three methods to a model problem in order to develop the algorithms and codes.

Another approach to solving inverse and so-called "missing information" problems is the maximum entropy method. This is a method of statistical inference that provides a least biased estimate based upon given information. Using the information theoretical definition of entropy (which it is our desire to maximize) one sets up a likelihood function that is a linear combination of the entropy function and constraints, which are modified by Lagrange multipliers. Such a constraint may be the goodness-of-fit criterion, for example. One then maximizes the entropy subject to the constraints. This then yields a transcendental equation that, in principle, can be iterated upon to yield an estimate of the unknown function. Examples of the use of this approach to astrophysical problems, intermolecular potentials in solid state physics, and signal analysis can be found in Refs. [7,9,10]. We believe that the maximum entropy condition is implicit in this problem through the use of the Boltzmann equation, which is the equation that maps the cross sections into the transport coefficients. The equilibrium and steady state solutions of the Boltzmann equation are, of course, maximum entropy solutions as can be seen from the behavior of the H-function, where $dH/dt \le 0$ by Boltzmann's H-theorem. The entropy (at equilibrium, of course) is directly related to the H-function via $H = -S/kV$ and, consequently, is maximized.

The two obvious examples to use as models are those of electrons drifting in a gas having either constant collision frequency or constant cross section. Here the electron energy distributions are Maxwell-Boltzmann and Druyvesteyn, respectively, with easily calculable E/N dependent drift velocities. Any method should be able to use the $v_d(E/N)$ curve and the appropriate $f_0(\epsilon)$ and recover $\sigma = \sigma_0/\sqrt{\epsilon}$ or $\sigma = \sigma_0$, respectively, for these two special cases.

We have investigated the capability of these optimization algorithms to reproduce the constant collision frequency cross section, $\sigma = \sigma_0/\sqrt{\epsilon}$, using the drift velocity, $v_d(E/N)$, and characteristic energies, $D/\mu(E/N)$, associated with that cross section. The resulting electron energy distribution function, $f_0(\epsilon)$, $v_d$, $D/\mu$, and $\langle\epsilon\rangle$ are all analytic functions [11].
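
As a rough illustration of why this model problem is convenient, the sketch below evaluates the closed-form drift velocity and characteristic energy for the constant collision frequency case. It assumes elastic collisions only, a cold gas (kT → 0), energies in eV, and SI units elsewhere; the function names, the helium-like mass ratio, and the value of sigma0 are illustrative assumptions, not values taken from the report.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
M_ELECTRON = 9.1093837015e-31  # electron mass, kg
TOWNSEND = 1.0e-21             # 1 Td expressed in V m^2


def drift_velocity(e_over_n_td, sigma0):
    """Drift velocity (m/s) for sigma(eps) = sigma0 / sqrt(eps).

    sigma0 is in m^2 eV^(1/2). The collision frequency per unit gas density,
    nu/N = sigma * v = sigma0 * sqrt(2e/m), is independent of energy, so the
    drift velocity is simply v_d = (e/m) (E/N) / (nu/N).
    """
    nu_over_n = sigma0 * math.sqrt(2.0 * E_CHARGE / M_ELECTRON)
    return (E_CHARGE / M_ELECTRON) * e_over_n_td * TOWNSEND / nu_over_n


def characteristic_energy(e_over_n_td, sigma0, mass_ratio):
    """D/mu = kT_e (eV) for the Maxwellian solution of the elastic-only,
    cold-gas two-term equation; mass_ratio is M/m (assumed input)."""
    e_over_n = e_over_n_td * TOWNSEND
    return mass_ratio * e_over_n ** 2 / (6.0 * sigma0 ** 2)


# Illustrative (assumed) inputs: sigma0 = 1e-19 m^2 eV^(1/2), M/m ~ 7296 (helium)
for e_n in (1.0, 2.0, 5.0, 10.0):
    print(e_n, drift_velocity(e_n, 1.0e-19), characteristic_energy(e_n, 1.0e-19, 7296.3))
```

In this limit the drift velocity is linear in E/N and the characteristic energy is quadratic in E/N, which gives any inversion method a simple known answer to reproduce.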


A.  THE  CREEPING  SIMPLEX 

This is a very versatile method for optimization problems. In finding the minimum of a function of n variables, $F(x_1,\ldots,x_n)$, we can think of the set $\{x_i\}$ as defining an n-dimensional surface in a space of n+1 dimensions. Then n+1 points on this surface define what is called a simplex. If one draws a picture of the surface defined by $F(x_1,x_2)$, as shown in Fig. 1, it is easy to see that this simplex is a triangle, i.e., three points determine the two lines that define a plane in three dimensions. Now, using several transformation rules (reflection, expansion, and contraction) this simplex, or n-dimensional plane, can be made to move around on the surface and, specifically, can be made to follow the contours of the surface moving ever "downward" toward the lowest point. This algorithm was first published by Nelder and Mead, but Press, et al. give a good description of it.
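
The simplex moves described above can be sketched in a few lines. This is a minimal variant of the Nelder and Mead procedure (reflection, expansion, inside contraction, and shrink) written for illustration only; it is not the code used for the calculations in this report, and the function name and default tolerances are assumptions.

```python
def nelder_mead(func, simplex, tol=1e-12, max_iter=2000):
    """Minimize func starting from n+1 vertices, each a list of n floats."""
    n = len(simplex) - 1
    pts = [list(p) for p in simplex]
    for _ in range(max_iter):
        pts.sort(key=func)                      # best vertex first, worst last
        best, worst = pts[0], pts[-1]
        if abs(func(worst) - func(best)) < tol:
            break
        # Centroid of every vertex except the worst one
        centroid = [sum(p[i] for p in pts[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if func(reflected) < func(best):
            expanded = along(2.0)
            pts[-1] = expanded if func(expanded) < func(reflected) else reflected
        elif func(reflected) < func(pts[-2]):
            pts[-1] = reflected
        else:
            contracted = along(-0.5)            # halfway between worst and centroid
            if func(contracted) < func(worst):
                pts[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one
                pts = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                for p in pts[1:]]
    return min(pts, key=func)


# Usage on a simple two-variable bowl (the simplex is a triangle, as in Fig. 1)
bowl = lambda p: (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 0.5) ** 2
print(nelder_mead(bowl, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

For the inversion problem, each vertex would hold the cross section values at the chosen energy points, and `func` would be the misfit between computed and measured transport coefficients.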

Figure 1: Example of a 2-dimensional simplex on a 3-dimensional surface

My implementation of the creeping simplex involves defining the initial simplex by choosing n+1 trial cross sections of the form $\sigma(\epsilon) = \sigma_0\,\epsilon^{-p}$, where the constant $\sigma_0$ is chosen from a uniform distribution of random numbers in $10^{-17} < \sigma_0 < 10^{-14}$ cm$^2$ and the power p is chosen from uniform random numbers in $0 < p < 1$. Using these cross sections the appropriate $f_0(\epsilon)$'s are computed and, consequently, the set of drift velocities, w(E/N), for a number of values of E/N. The function to be minimized then is a $\chi^2$ on the difference between the "data" $v_d(E/N)$ and the n+1 trial w(E/N), that is, $\sum_k [(v_k - w_k)/v_k]^2$. With good convergence properties, the simplex reduces nearly to a point on the surface, so that the final result is the cross section associated with the final simplex. In Fig. 2 we see the results of using this procedure on the test problem. This calculation, which took about 20 minutes on a 25 MHz 80386 PC, used 5 values of E/N (1-10 Td) and 13 cross section points running from 0.01 eV in powers of 2. We see that the results look excellent except at the largest energies, where the drift velocities over the range of E/N used are insensitive to $\sigma(\epsilon)$.
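
The setup just described (energy grid, random power-law trial cross sections, and relative-difference misfit) might be sketched as follows. The function names are hypothetical, and the bounds passed for the scale cross section are placeholders chosen by the caller rather than values asserted here; the step that turns a cross section into drift velocities (solving the Boltzmann equation) is deliberately left out.

```python
import random


def energy_grid(n_points=13, e_min=0.01):
    """Energy points (eV) starting at e_min and increasing in powers of 2."""
    return [e_min * 2 ** k for k in range(n_points)]


def trial_cross_section(energies, rng, s_lo, s_hi, p_lo=0.0, p_hi=1.0):
    """One power-law trial cross section sigma0 * eps**(-p) on the grid."""
    sigma0 = rng.uniform(s_lo, s_hi)
    p = rng.uniform(p_lo, p_hi)
    return [sigma0 * e ** (-p) for e in energies]


def initial_simplex(energies, rng, s_lo, s_hi):
    """n+1 trial cross sections for an n-point energy grid."""
    n = len(energies)
    return [trial_cross_section(energies, rng, s_lo, s_hi) for _ in range(n + 1)]


def chi_squared(measured, trial):
    """Sum of squared relative differences between data and trial values."""
    return sum(((v - w) / v) ** 2 for v, w in zip(measured, trial))


rng = random.Random(0)
grid = energy_grid()
simplex = initial_simplex(grid, rng, 1.0e-17, 1.0e-14)  # placeholder bounds, cm^2
print(len(simplex), len(simplex[0]))
```

Using relative rather than absolute differences keeps low and high E/N points on an equal footing even when the drift velocity spans a wide range.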

Application  to  Real  Gases 

We have used the simplex algorithm to recover the momentum transfer cross sections for He, Ar, and CH4 from their E/N dependent transport coefficients. The results are shown in Figs. 3 through 6.

The He calculations used the drift velocity and characteristic energy measurements given in the book by Huxley and Crompton. Eleven values each of $v_d(E/N)$ and $D/\mu(E/N)$ were used with 0.1 Td < E/N < 3 Td. In the calculations on real gases, the power parameter in the trial cross sections was in the range $-1 < p < +1$, rather than in (0,1) as in the model calculation. The original transport data are cited by Huxley and Crompton as coming from Refs. 13 and 14. Shown for comparison is the He momentum transfer cross section of Crompton, et al. Note that this cross section was derived from different drift velocity data (at 77 K rather than 300 K) and without use of $D/\mu$ data, so we expect the resulting cross section to be somewhat different from that which would be derived from the data used here. We see reasonable agreement except at the extremes in energy. This, as has been discussed above, is due to the limited range of E/N used. These calculations take about 2 hours on an 80386 PC. Better accuracy over the energy range would likely be achieved with a larger number of values of $v_d(E/N)$ and $D/\mu(E/N)$.

Figure 2: Momentum transfer cross section from drift velocity data for creeping simplex and simulated annealing algorithms

Figure 3: Downhill simplex results for He


A similar calculation performed for Ar is shown in Fig. 4. Twelve values of E/N were used in the range 0.002 Td < E/N < 0.1 Td, which are quite low E/N values. The 90 K transport data came from Ref. 16 ($v_d$) and Ref. 17 ($D/\mu$), as cited by Huxley and Crompton. Shown for comparison are the momentum transfer cross sections for Ar found by Frost and Phelps [18] and by Milloy, Crompton, et al. [19]. Neither of these cross sections was derived from the transport coefficients used in this calculation, so we expect there to be differences. This calculation is meant only to indicate the possibilities of what might be achievable with further development of this approach. Interestingly, we see that the $\sigma_m(\epsilon)$ from the optimization algorithm agrees with Milloy, et al. below the Ramsauer minimum and with Frost and Phelps above the minimum. At high energy we have the usual problem, which would be taken care of by more E/N values over a larger range.

The most sophisticated calculation was on methane, CH4. Methane has, in addition to a Ramsauer minimum in $\sigma_m(\epsilon)$, low energy inelastic cross sections, i.e., vibrational levels with energy losses of 0.162 and 0.361 eV. We used the same approach as described above with 12 values of E/N in the range from 0.1 to 12 Td and consistent $v_d(E/N)$ and $D/\mu(E/N)$ data as measured by Haddad. The momentum transfer cross section was computed at 12 energy values and the vibrational cross section at 10. Since this work is developmental in nature, we used only one vibrational state (0.162 eV) so that $\sigma_v(\epsilon)$ approximately represents the sum of the two actual vibrational cross sections. This approach has been used in other analyses, notably those of Pollock and of Pitchford, et al. We are still using the two-term expansion of the Boltzmann equation even though it is known that there is a loss of accuracy for methane and other gases that, in one way or another, do not totally satisfy the conditions used in the development of this approximation. The results of using the creeping simplex on CH4, which required 3 hours of computational time on the PC, are shown in Figs. 5 and 6, where we see $\sigma_m(\epsilon)$ and $\sigma_v(\epsilon)$, respectively, along with numerous cross sections derived by other authors from swarm data. The first observation that we make is that there is anything but unanimity in what constitutes the "best" or "correct" cross sections. Clearly the creeping simplex results are as valid as any of the others that we see in the Figures. We see from the plots of drift velocities and characteristic energies, respectively, in Figs. 7 and 8 that these cross sections yield satisfactory transport coefficients. Clearly the next step in this development would be to extend this algorithm to multiple inelastic processes, such as two vibrational levels in CH4 or, more ambitiously, H2 with rotational and vibrational levels.

Figure 4: Downhill simplex results for Argon

Figure 5: CH4 momentum transfer cross sections


B.  SIMULATED  ANNEALING 

Simulated annealing is a function minimization method that is an outgrowth of the Metropolis algorithm commonly used for computer simulation of canonical ensembles in statistical mechanics. One minimizes a quantity E by making random changes in the configuration of the system and deciding whether or not to accept the new configurations based on comparison with the Boltzmann probability $P(E) = \exp(-E/kT)$, where T is a control parameter. When applying the method to a thermodynamic system, say a collection of atoms at temperature T, one displaces an atom at random (the Monte Carlo move) and computes the total energy E of the system. If it has decreased, the move is accepted and another MC move is made. If it has increased, the move is accepted with probability $e^{-\Delta E/kT}$. This allows the system a means of moving out of a local minimum if it has settled into one. The annealing part of the method involves slowly decreasing T as the simulation proceeds. This approach is applicable to very large systems and has been very successful in providing near optimal solutions to the so-called "traveling salesman problem."
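
The acceptance rule and cooling schedule can be sketched for a one-variable function as below. This is an illustrative toy, not the code used in this report: the function names, the step size, the starting temperature, and the cap on attempted moves per stage are all assumptions, while the default schedule (multiply kT by 0.5 after 130 accepted moves, for 50 stages) mirrors the one quoted for the sample calculation.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng):
    """Always accept downhill moves; accept uphill ones with prob. exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def anneal(func, x0, kt0=1.0, cooling=0.5, moves_per_stage=130,
           stages=50, step=0.5, seed=0):
    """Minimize func(x) by simulated annealing; returns (best_x, best_value)."""
    rng = random.Random(seed)
    x, fx = x0, func(x0)
    best_x, best_f = x, fx
    kt = kt0
    for _ in range(stages):
        accepted = 0
        attempts = 0
        # Cap the attempts so a very cold stage cannot loop forever (assumption)
        while accepted < moves_per_stage and attempts < 20 * moves_per_stage:
            attempts += 1
            trial = x + rng.uniform(-step, step)     # the Monte Carlo move
            f_trial = func(trial)
            if metropolis_accept(f_trial - fx, kt, rng):
                x, fx = trial, f_trial
                accepted += 1
                if fx < best_f:
                    best_x, best_f = x, fx
        kt *= cooling                                # the annealing step
    return best_x, best_f


print(anneal(lambda x: (x - 2.0) ** 2, 0.0))
```

Tracking the best point seen so far is a convenience added here; the basic method simply reports the final configuration.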

Figure 8: Characteristic energies

We designed the following algorithm to perform the Monte Carlo moves appropriate to this problem. We have values of cross sections, $\sigma_j = \sigma_m(\epsilon_j)$, at some number of values of energy, $\epsilon_j$. In the calculation presented here there are 13 energy points, i.e., $1 \le j \le 13$. The object of the algorithm is to vary these in the manner described above in order to minimize the same $\chi^2$ function that was used in the creeping simplex method. To perform this variation an index j is chosen using a random number generator and $\sigma_j$ is then multiplied by some scale factor. Let $F^0$ be the maximum fractional change that we allow the program to make in a $\sigma_j$ in a single MC move and let $\eta$ be a uniform random number in (0,1); then

$$\sigma_j^{n+1} = \sigma_j^{n}\cdot[1 + (2\eta F^0 - F^0)]$$

is the formula for obtaining the desired random variation of $\sigma_j$ from iteration to iteration. In order to allow a smooth variation of the function $\sigma_m(\epsilon)$ we devised a "rubber ruler" algorithm by which $\sigma_j$ is varied as just described but smaller variations are made simultaneously in the other cross sections lying nearby in energy. The complete MC move algorithm is

$$\sigma_k^{n+1} = \sigma_k^{n}\cdot[1 + \ldots]$$

where j is chosen as above and $\epsilon_b$ is a scale energy, which was 0.2 eV in the calculations to be described.
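
One way such a "rubber ruler" move could look is sketched below. The exact weighting used in this report is not recoverable from the text, so the exponential falloff with energy separation used here is purely an assumption, as are the function and parameter names; the sketch only illustrates the idea that the randomly chosen point receives the full random change while its neighbors in energy receive smaller ones.

```python
import math
import random


def rubber_ruler_move(sigma, energies, f_max, e_scale, rng):
    """Return (new cross sections, chosen index j) after one smoothed MC move.

    sigma    -- current cross section values, one per energy point
    energies -- energy grid (eV)
    f_max    -- maximum fractional change allowed in a single move
    e_scale  -- scale energy (eV) controlling how far the change spreads
    """
    j = rng.randrange(len(sigma))
    eta = rng.random()
    change = 2.0 * eta * f_max - f_max      # uniform in (-f_max, +f_max)
    new_sigma = []
    for s_k, e_k in zip(sigma, energies):
        # ASSUMED weighting: full change at point j, decaying with |e_k - e_j|
        weight = math.exp(-abs(e_k - energies[j]) / e_scale)
        new_sigma.append(s_k * (1.0 + change * weight))
    return new_sigma, j


rng = random.Random(1)
grid = [0.01 * 2 ** k for k in range(13)]
flat = [1.0] * 13
moved, j = rubber_ruler_move(flat, grid, 0.1, 0.2, rng)
print(j, moved)
```

Because every factor stays within (1 - f_max, 1 + f_max), the cross sections remain positive whenever f_max is less than 1.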

We have applied this algorithm to the test problem described above with the results that are shown in Fig. 2. In this calculation the variations in the trial cross section were made as described above. In addition, the "temperature" was kT = 0.005 and the annealing schedule was such that kT was multiplied by 0.5 after 130 successful MC moves had been made; this was carried on for 50 iterations. This particular calculation took 25 minutes of CPU time on a Cray X/MP. Although these results are not as spectacular as those obtained with the simplex, we believe this method is worth further study. The choice of the appropriate T and its annealing schedule is very much a matter of trial and error and experience [27], and this sample calculation is certainly not optimized. In addition, it is easy to see how to implement this algorithm for any number of elastic and inelastic processes, and how to use prior information on the uncertainty associated with each cross section in performing the MC variations. Therefore, this method is likely to work better for complicated problems than is the creeping simplex algorithm, in which it is not clear how to include the latter kind of constraint.

C.  NEURAL  NETWORKS 

This is a very new area of research. Neural networks, which consist of layers of simulated "neurons" with associated activation functions, transfer functions, and weighting functions for the "synapse" connections to other neurons, have been shown to be capable of computing decisions in optimization problems. Such networks have a "learning" capability in that the weights associated with connections between pairs of neurons can be modified (strengthened or weakened) in response to the network's successes and failures so as to optimize in favor of the network's successful strategies. This is probably the most novel, but least well defined, approach to the physical problem of inverting electron transport data. Aarts and Korst have found that on some graph problems the neural network approach is from 20 to 400 times faster than the simulated annealing method described above.

Figure 9: Two-layer backpropagation neural network

One kind of neural network consists of a network of layers of simulated neurons as shown in Fig. 9 (taken from Ref. 31). The key elements are an input layer, one or more "hidden" layers, and an output layer. Each neuron has a transfer function associated with it that gives an output value that is some non-linear function of the sum of the input values, and each pair of neurons has a weight value associated with it. The concept behind this kind of network (feed-forward, back-propagation) is that it can "learn" to associate a set of output patterns with a set of input patterns by adjusting the weights that connect together the network of non-linear devices. The usual transfer function used in such networks is the sigmoid $T(x) = 1/(1+e^{-x})$ [there is an equivalent arctan function also]. If the output of the jth neuron is $o_j$ and $w_{ij}$ is the weight connecting neurons i and j, then the output of the ith neuron is:

$$o_i = \frac{1}{1 + e^{-\sum_j w_{ij} o_j}}.$$

The network is trained by running a number of cases of (input, output) sets through it and adjusting the weights to minimize the sum of the squares of the differences between the desired result and the computed result. This quadratic function is the so-called energy, cost, or objective function. The weights are adjusted at random using a steepest descents or a conjugate gradient algorithm. After the network is "trained" it can be run on other input vectors to yield output vectors that, hopefully, are good approximations to the correct output.
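
A forward pass through such a network, together with the quadratic objective function, can be sketched as follows. The data layout (a list of per-layer weight matrices) and the extra per-neuron bias term are assumptions made for this illustration; the neuron formula in the text has no bias, which corresponds to setting every bias to zero. No training step is shown.

```python
import math


def sigmoid(x):
    """The transfer function T(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(weights, biases, inputs):
    """Outputs o_i = T(sum_j w_ij * o_j + b_i) for one layer of neurons."""
    return [sigmoid(sum(w * o for w, o in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(network, inputs):
    """Propagate an input vector through a list of (weights, biases) layers."""
    outputs = list(inputs)
    for weights, biases in network:
        outputs = layer_output(weights, biases, outputs)
    return outputs


def squared_error(desired, computed):
    """Sum of squared differences: the energy, cost, or objective function."""
    return sum((d - c) ** 2 for d, c in zip(desired, computed))


# Tiny example: 2 inputs -> 3 hidden neurons -> 1 output, all weights zero
net = [([[0.0, 0.0]] * 3, [0.0] * 3), ([[0.0, 0.0, 0.0]], [0.0])]
print(forward(net, [0.3, 0.7]))
```

Training would then consist of changing the entries of `net` so that `squared_error` summed over the training cases decreases.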

Application  to  the  Problem  of  Obtaining  Cross  Sections  from  Swarm  Data 

In order to explore the feasibility of using neural networks on this problem we have been working with a commercial neural net simulator called BRAINMAKER [36]. This is one of a number of such programs as can be seen from the list recently compiled by BYTE Magazine (see the Appendix of this document, which was taken from Ref. 37). We wrote a program to generate cross section sets of the form $\sigma(\epsilon) = \sigma_0/\epsilon^{p}$, where $\sigma_0$ and p are chosen from uniform random numbers in $(10^{-17}, 10^{-14})$ and (0,1) respectively, and then compute for a range of E/N the distribution function $f_0(\epsilon)$ and the associated drift velocities, $v_d$, and characteristic energies, $D/\mu$. We then set up a training set for BRAINMAKER that consisted of the sets $\{v_d\}$ and $\{D/\mu\}$ for ten values of E/N and the cross section $\sigma(\epsilon)$ at nine energies from which the swarm data were computed. The input layer of the network then consists of 20 neurons, one for each value of $v_d(E/N)$ or $D/\mu(E/N)$. The output layer comprises nine neurons, one for each cross section point $\sigma(\epsilon_i)$, i = 1 to 9. The network has two hidden layers of 25 neurons each. In summary the network is described as follows:


Layer  Neurons  Weights

1      20       0
2      25       525
3      25       650
4      9        234
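
The weight counts in this table are consistent with each neuron carrying one weight per neuron in the preceding layer plus one additional weight (for example a threshold or bias). That reading is an inference from the numbers, not a statement taken from the BRAINMAKER documentation, and the helper name below is hypothetical.

```python
def weight_counts(layer_sizes):
    """Weights per layer, assuming (inputs + 1) weights for every neuron."""
    counts = [0]  # the input layer has no incoming weights
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        counts.append(n_out * (n_in + 1))
    return counts


print(weight_counts([20, 25, 25, 9]))
```

The same rule can be applied to any other layer layout to check a reported weight count.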

The training tolerance was 5 percent, meaning that for the network to be acceptable the cost/energy/objective function of the difference between the $\{\sigma(\epsilon)\}$ defined by the values of the output neurons and the "data" given to the network as part of the training pattern had to be less than or equal to 0.05.

Once the network was "trained" we gave it another file of sets $\{v_d\}$ and $\{D/\mu\}$ corresponding to different sets $\{\sigma(\epsilon_i)\}$ computed with random $\sigma_0$ and p to see what it predicted for the cross sections. These results are shown in Fig. 10 for the three best (out of 11) cases. These illustrate several things. First, the results denoted by the circles are very good. They all diverge at high energy because the highest E/N that I used (3.0 Td) was too small for $v_d$ and $D/\mu$ to be adequately sensitive to the high energy part of the cross section. In addition, we have observed that the results for large cross sections are generally better than the results for small cross sections. We think that this is due to a dynamic range problem with BRAINMAKER that we attribute to its being single precision; it was not really designed for scientific number crunching. This problem could, perhaps, be gotten around by using $\log[\sigma(\epsilon)/\sigma_0]$ where $\sigma_0$ is a scale cross section equal to, say, 1 Å$^2$.

Figure 10: Ba


Application  to  a  Real  Gas 

In order to test the feasibility of using this kind of neural network to find the pattern in the mapping between $\{v_d(E/N), D/\mu(E/N), \text{etc.}\}$ and $\{\sigma(\epsilon)\}$ we trained BRAINMAKER on 25 sets of $\{v_d(E/N), D/\mu(E/N)\}$ data for cross sections of the form $\sigma(\epsilon) = \sigma_0\,\epsilon^{-p}$ where $-1 < p < +1$. That is, we have some cross sections that increase with energy and some that decrease with energy. We then constructed an input set for Xe with $\{v_d(E/N)\}$ from Hunter, et al. and $\{D/\mu(E/N)\}$ from Koizumi, et al. Unfortunately neither paper presented both drift velocity and characteristic energy data. This particular network consisted of three layers:

Layer  Neurons  Weights 

1  18  0 

2  20  380 

3  9  189 

Fig. 11 displays so-called Hinton diagrams of the weights of the connections between the neurons of the input layer and the hidden layer and between the neurons of the hidden layer and the output layer.

The cross section that the neural network returned in the output layer for Xe in the energy range around the Ramsauer minimum is shown in Fig. 12 along with the $\sigma_m(\epsilon)$ from Hunter, et al. [38], Koizumi, et al., and Frost and Phelps. We see that the neural network gives a respectable estimate of the cross section even though the number of E/N values is small and the energy grid is very coarse.

Figure 11: Hinton diagram of weights of connections between neurons

Figure 12: Xe momentum transfer cross section


A  Different  Neural  Network  Approach 

Jeffrey and Rosner have developed a neural network approach to finding the solution to an integral equation that bears further study for application to this problem. This network is a so-called Hopfield [32,34,40-42] or recurrent network. It consists of only one layer, as shown in Fig. 13 (taken from the BRAINMAKER documentation), and is not trained as is the network that we have described above. Rather, it is essentially an iteration algorithm where the output is fed back to modify the input. The mathematical description is developed as follows. We want the solution q to the integral equation $g(y) = \int k(x,y)\,q(x)\,dx$, which we write in discrete form as $g_i = \sum_j k_{ij} q_j$. If we write the energy/cost/objective function as a goodness-of-fit function

$$H(q) = \frac{1}{2}\sum_i \left(g_i^{d} - g_i\right)^2,$$

where $g_i^{d}$ denotes the given data, and define $I_j = \sum_i k_{ij}\,g_i^{d}$ and $T_{ij} = -\sum_m k_{mi} k_{mj}$, then

$$H(q) = -\frac{1}{2}\sum_i\sum_j T_{ij}\,q_i q_j - \sum_j I_j q_j + \frac{1}{2}\sum_i \left(g_i^{d}\right)^2.$$

This now is in the standard form investigated by Hopfield where $q = (q_1,\ldots,q_N)$ is regarded as the output vector of a network of N neurons. By considering $dH/dt = \sum_i (\partial H/\partial q_i)(dq_i/dt)$, Jeffrey and Rosner then derive the update equation for the $q_i$ on the (n+1)th iteration.


This gets more complicated for more complicated energy functions. In even the most
simple form of our problem (no inelastic processes), the kernel k(x,y) is also a function of
q(x). That is,

$$v_d(E/N) \propto \int \left\{ d f_0[\epsilon, \sigma^{-1}(\epsilon), E/N] / d\epsilon \right\} \sigma^{-1}(\epsilon)\, \epsilon\, d\epsilon$$

This does not prevent us from trying this kind of iteration, however. One aspect of this
iterative approach that one must watch out for is chaos, as it is known that recurrent
networks can be chaotic.

Figure 13: Hopfield single layer recurrent neural network
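
To make the iteration idea concrete, the sketch below minimizes the quadratic H(q) defined above for a fixed kernel. It is not the update equation of Jeffrey and Rosner, which is not reproduced here. It assumes plain gradient descent, dq_j/dt = -dH/dq_j = sum_l T_jl q_l + I_j, discretized with a constant step. The function name `hopfield_iterate`, the step size, and the 2-by-2 test kernel are all hypothetical, and the kernel does not depend on q as it does in the drift velocity problem.

```python
def hopfield_iterate(k, g_obs, step, iterations):
    """Reduce H(q) = 1/2 * sum_i (g_obs[i] - sum_j k[i][j] * q[j])**2
    by repeating q <- q + step * (T q + I), starting from q = 0."""
    m, n = len(k), len(k[0])
    # I_j = sum_i k_ij * g_obs_i and T_jl = -sum_i k_ij * k_il
    I = [sum(k[i][j] * g_obs[i] for i in range(m)) for j in range(n)]
    T = [[-sum(k[i][j] * k[i][l] for i in range(m)) for l in range(n)]
         for j in range(n)]
    q = [0.0] * n
    for _ in range(iterations):
        # The downhill direction is -dH/dq_j = sum_l T_jl q_l + I_j.
        downhill = [sum(T[j][l] * q[l] for l in range(n)) + I[j]
                    for j in range(n)]
        q = [q[j] + step * downhill[j] for j in range(n)]
    return q


# Hypothetical test case: data generated from q = (1, 2).
kernel = [[2.0, 1.0], [1.0, 3.0]]
data = [4.0, 7.0]
print(hopfield_iterate(kernel, data, step=0.1, iterations=200))
```

The step must be small enough for the iteration to settle (here, below 2 divided by the largest eigenvalue of the matrix -T); too large a step makes the output oscillate or diverge, which is one simple way such a feedback scheme can misbehave.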


SECTION IV

CONCLUSIONS  AND  SUGGESTIONS  FOR  FURTHER  RESEARCH 

We  have  explored  three  optimization  techniques  for  treating  the  inverse  problem  of 
obtaining  electron  collision  cross  sections  from  electron  transport  data.  These  methods 
were  (a)  the  downhill  or  creeping  simplex;  (b)  simulated  annealing;  and  (c)  neural 
networks.  We  devoted  the  greatest  amount  of  effort  to  methods  (a)  and  (c).  The  simplex 
method  was  straightforward  to  implement  and,  as  we  saw  above,  demonstrated  a  capability 
for  making  headway  on  this  problem.  Simulated  annealing  is  capable  of  solving  any 
minimization  problem  that  the  simplex  can  solve  and  probably  much  more.  It,  however, 
requires  substantial  further  development  and  may  require  computational  resources  beyond 
what  a  PC  can  currently  provide.  We  devoted  much  effort  to  investigating  the  neural 
network  approach  because  it  is  very  new  and  has  not  yet  had  much  application  to  the 
problems  of  applied  physics.  That  approach  also  has  demonstrated  some  capability  in 
addressing  the  problem  at  hand. 

The paths for further development of methods (a) and (b) are apparent and have
been discussed above. With regard to (a), another possibility for development is to
implement the algorithm developed by N. Karmarkar [44,45] at AT&T Bell Laboratories in
1984. It has been claimed that this algorithm is much faster than the simplex. A
perusal of Science Citation Index, however, shows that Karmarkar's algorithm has not yet
made it into the physics literature.

We believe that the neural network approach too is worthy of further exploration.
The capabilities of BRAINMAKER for this application have, however, been nearly exhausted.

The next step would be to write a network for this problem with larger numbers of
neurons; double precision arithmetic; a capability for having different transfer functions for
different layers; allowing different convergence criteria for different energy ranges; and,
perhaps, using the Boltzmann training algorithm [34,42] (an application of simulated
annealing to adjustment of the weights of the connections in the network); i.e., more
flexibility in general. It would be interesting to investigate how such a network would
perform when trained on a completely artificial data set (probably using more sophisticated
functions for the training cross sections than we have used here) as compared to training by
feeding it a large set of data on real atoms and molecules. Ultimately we may find that a
neural network is a good means of getting a rough estimate of a cross section $\sigma(\epsilon)$ that can
then be refined using another numerical optimization algorithm. The conventional wisdom
has been that neural networks are useful for only very rough solutions and not for accurate
scientific calculations, but some authors, such as Lapedes and Farber, refute that point of
view. As this area of research is very much in its infancy, we can expect many new
developments in the application and understanding of neural networks in the future.
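
The idea of adjusting connection weights by simulated annealing, mentioned above as a possible training method, can be sketched as follows. This is a generic Metropolis-style annealing loop written for illustration, not the Boltzmann training algorithm of the cited references and not anything provided by BRAINMAKER. The function name `anneal_weights`, the cooling schedule, the perturbation size, and the two-weight test problem are all assumptions.

```python
import math
import random


def anneal_weights(error, weights, t_start=1.0, t_end=1e-3, cooling=0.95,
                   moves_per_temp=50, scale=0.5, seed=0):
    """Perturb one weight at a time. Always accept a move that lowers the
    error; accept an uphill move with probability exp(-increase / T)."""
    rng = random.Random(seed)
    current = list(weights)
    e_cur = error(current)
    best, e_best = list(current), e_cur
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            trial = list(current)
            trial[rng.randrange(len(trial))] += rng.gauss(0.0, scale)
            e_trial = error(trial)
            if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / t):
                current, e_cur = trial, e_trial
                if e_cur < e_best:
                    best, e_best = list(current), e_cur
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, e_best


# Hypothetical test problem: one linear neuron y = w0 * x + w1.
samples = [(x, 2.0 * x - 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]


def squared_error(w):
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in samples)


print(anneal_weights(squared_error, [0.0, 0.0]))
```

Because uphill moves are sometimes accepted while the temperature is high, the search can leave a poor local minimum, at the cost of many more error evaluations than a purely downhill method.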
NEURAL NETWORKS: THEORY AND PRACTICE (BYTE Magazine, August 1989)

For  most  of  their  existence,  neural  networks  and  neural-network  simulations  have  been  solely  objects  of  university-based  research.  In  the  last 
few  years,  however,  researchers  and  others  have  founded  companies  dedicated  to  producing  commercial  products  based  on  neural-network 
technology. To reflect both the academic and commercial aspects of the technology, this resource guide consists of two parts. The In Theory
section lists books and articles you can read to learn more about neural networks. The In Practice section lists some of the available
neural-network hardware and software products, listed alphabetically by company name.
The Brain Simulator . $99
Runs under MS-DOS
Tutorial software for neural circuit design
Abbot, Foster & Hauserman
44 Montgomery, Fifth Floor
San Francisco, CA 94014
(800) 562-0025
(415) 955-2711
Inquiry 1181.

N-NET 

MS-DOS version . $895
VAX/VMS version . starting at $2995
Integrated neural-network development
system; uses functional link net
architecture
AI Ware, Inc.
11000 Cedar Ave., Suite 212
Cleveland, OH 44106
(216) 421-2380

Inquiry  1182. 

BrainMaker . $99.95
Runs under MS-DOS
Neural-network simulation software;
supports five types of nodes and can
process up to 500,000 connections per
second

California  Scientific  Software 
160  East  Montecito,  Suite  E 
Sierra  Madre,  CA  91204 
(818)  355-1094 
Inquiry  1183. 

Cognitron 

MS-DOS Windows and Mac versions . $600
INMOS transputer version . $1800


Neural-network/parallel-processing 
prototyping  and  delivery  system 
Cognitive  Software,  Inc. 

703  East  30th  St. 

Indianapolis,  IN  46205 
(317)924-9988 
Inquiry  1184. 

Connections . $87.95 

Runs  under  MS-DOS 
A  traveling-salesman  demo  modeled 
after  Hopfield  networks 
NetWurkz . $79.95

Runs  under  MS-DOS 
Elementary  introduction  to  neural 
networks 

DAIR  Computer  Systems 
3440  Kenneth  Dr. 

Palo Alto, CA 94303
(415)494-7081 
Inquiry  1185. 

Savvy Text Retrieval System
Savvy Signal Recognition System
Savvy Vision Recognition System
(Call for pricing)
Run under VAX/VMS, MS-DOS,
and Unix
Libraries of C subroutines that use
neural technology to solve real-world
problems
Excalibur Technologies
2300 Buena Vista SE
Albuquerque, NM 87106
(505)  764-0081 

Inquiry  1186. 

ANZA . from  $7000 

AT-compatible  neural-network 
coprocessors;  includes  software  and 
programming  interface 

ANZA  Plus . from  $12,500 

AT-compatible  neural-network 
coprocessors 

ANZA Plus/VME . $24,950

Neural-network  coprocessor  for  Sun 
workstations 

AXON . $1950 

A  neural-network  description  language 
Neural  Network  Development 

Toolkit . $3950 

For  ANZA  Plus  and  ANZA  Plus/VME 
systems 

Ports  C  programs  into  ANZA  Plus  and 
ANZA  Plus/VME  formats;  includes 


AXON 

ExploreNet

MS-DOS  version . $995 

Sun  version . $3950 


Stand-alone  neural-network  software 
HNC, Inc.
5501 Oberlin Dr.
San Diego, CA 92121
(619)  546-8877 
Inquiry  1187. 

MD1210 Fuzzy Set Comparator . $38

Hardware implementation of Hopfield neurons
Micro Devices
5695B Beggs Rd.
Orlando, FL 32810
(407)299-0211 
Inquiry  1188. 

N1000 

Neural-network  development  tools  for 

signal  and  image  processing  applications 
Including 80386 computer . from $19,000
Support package only (for Sun-4,
Sun-3, IBM PC AT, and PS/2
Model 50) . from $7955


N500 . from  $495 

Runs  on  IBM  PC  AT  and  PS/2  Model  50 

Single-unit  RCE  network  software 
Nestor,  Inc. 

1  Richmond  Sq. 

Providence, RI 02906
(401)331-9640 
Inquiry  1189. 


Awareness . $275 

Runs  under  MS-DOS 
Introduction  to  four  types  of  neural- 
network  paradigms 

Genesis . $1095 


Runs  under  MS-DOS 
Neural-network  development 
environment 
Neural  Systems 
2827  West  43rd  Ave. 

Vancouver,  BC  Canada  V6N  3H9 
(604)  263-3667 
Inquiry  1190. 

NeuralWorks  Explorer . $299 

Runs  under  MS-DOS 

An  introduction  and  tutorial  on  neural 

networks 

NeuralWorks Professional II

MS-DOS  and  Macintosh 

versions . $1495 

Sun-3,  Sun-4,  and  Sun386i 

versions . $2995 

NeXT  and  INMOS  transputer 

versions . call  for  pricing 

Neural-network  development  system 

NeuralWorks Designer Pack . $1995
MS-DOS and Sun versions
Links Professional II networks with C programs

NeuralWare,  Inc. 

103  Buckskin  Court 
Sewickley,  PA  15143 
(412)741-5959 
Inquiry  1191. 

MacBrain . $995 

Runs  on  Macintosh 

Lets  you  prototype  and  deliver  neural- 
network  applications 
HyperBrain 
(Comes  with  MacBrain) 

Toolkit allows you to build neural-network
applications within HyperCard
Neurix, Inc.
1 Kendall Sq., Suite 2200
Cambridge,  MA  02139 
(617)577-1202 
Inquiry  1192. 

Owl I, II, III . from $349
Libraries of modules for IBM and
compatibles that let you define and
access 10 different neural networks
Extension Pack . $149
Three additional networks
Olmsted & Watkins
2411 East Valley Pkwy., Suite 294

Escondido,  CA  92025 

(619)  746-2765 

Inquiry  1193. 


Intelligent  Pattern  Recognition 

Chips . $500 

Stores a 1000-by-64 matrix of weights
and  multiplies  it  with  an  input  vector 
Oxford  Computer 
39  Old  Good  Hill  Rd. 

Oxford,  CT  06483 
(203)  881-0891 
Inquiry  1194. 

ANSim2.1 . $495 

Runs  under  MS-DOS 
13  neural-network  models 

ANSkit . $950 

Runs  under  MS-DOS 
Neural-network  development  system 

ANSpec . $2995 

Runs  under  MS-DOS 
Neural-network  specification  language 
Delta Floating Point Processor . $24,950
Runs on IBM PC, AT, PS/2s, and Sun386i
Neural-network accelerator boards
Sigma Neurocomputer Workstations . from $31,500
80386-based systems with Delta
Processor, ANSkit, Delta C, Delta
Macro, and ANSpec
SAIC 

10260  Campus  Point  Dr. 

Mail  Stop  71 
San  Diego,  CA  92121 
(619)  546-6290 

Inquiry  1195. 

DENDROS-1 . $35 

Neural-network  chip  that  produces  the 
dot  product  of  the  inputs  and  the 
connection  weights  of  22  synapses 
DENDROS-1  Evaluation  Board  ...  $695 
Uses  eight  DENDROS-1  chips  to  create 
a  hardware-based  neural  network 
Syntonics Systems, Inc.
20790 Northwest Quail Hollow Dr.
Portland, OR 97229
(503)293-8167 
Inquiry  1196. 

TRW  Mark  V  Neural  Processor 

Write  for  pricing  information 

Runs  on  VAX/VMS 

MC68020-based parallel-processing

system  includes  tools  for  neural-network 

applications 

TRW 

Military  Electronics  &  Avionics  Div. 

One  Rancho  Carmel 
San  Diego,  CA  92128 
(619)  592-3482 
Inquiry  1197. 

NeuroShell . $195

Runs  under  MS-DOS 
Creates  neural-network  applications 
using  a  modified  back-propagation 
Ward  Systems  Group,  Inc. 

228  West  Patrick  St. 

Frederick, MD 21701
(301)662-7950 
Inquiry  1198. 

Inclusion in the resource guide does not indicate
that BYTE endorses or recommends either the
products or the companies. In addition, BYTE